A Survey of Open Source Data Mining Systems
Abstract
Open source data mining software represents a new trend in data mining research, education and industrial applications, especially in small and medium enterprises (SMEs). With open source software, an enterprise can easily initiate a data mining project using the most current technology. The software is often available at no cost, allowing the enterprise to focus instead on ensuring that its staff can freely learn data mining techniques and methods. Open source also lets staff understand exactly how the algorithms work by examining the source code, if they so desire, and fine-tune the algorithms to suit the specific purposes of the enterprise. However, diversity, instability, scalability and poor documentation can be major concerns in using open source data mining systems. In this paper, we survey open source data mining systems currently available on the Internet. We compare 12 open source systems on several aspects, including general characteristics, data source accessibility, data mining functionality, and usability, and we discuss the advantages and disadvantages of these systems.
Similar resources
Data Mining User Activity in Free and Open Source Software (FOSS)/ Open Learning Management Systems
Free and Open Source Software (FOSS)/Open Educational Systems development projects abound in higher education today. Many universities worldwide have adopted open source software like ATutor and Moodle as an alternative to commercial or homegrown systems. The move to open source learning management systems entails many special considerations, including usage analysis facilities. The tracking of...
Spatial modelling of zonality elements based on compositional nature of geochemical data using geostatistical approach: a case study of Baghqloom area, Iran
Because of the constant-sum constraint, geochemical data are compositional data that form a closed number system. A closed number system is a dataset of several variables whose sum is constant, being equal to one. By calculating the correlation coefficient of a closed number system and comparing it with an open number system, ...
A Comparative Analysis of Data Mining Tools in Agent Based Systems
Worldwide technological advancement has brought widespread change in the adoption and utilization of open source tools. Since most organizations across the globe deal with large amounts of data that are updated online, with transactions made every second, managing, mining and processing this dynamic data is very complex. Successful implementation of the data mining technique requires ...
Sports Result Prediction Based on Machine Learning and Computational Intelligence Approaches: A Survey
In the current world, sports produce considerable statistical information about each player, team, games, and seasons. Traditional sports science believed science to be owned by experts, coaches, team managers, and analyzers. However, sports organizations have recently realized the abundant science available in their data and sought to take advantage of that science through the use of data mini...
A survey of open source data science tools
Purpose – Data science is the study of the generalizable extraction of knowledge from data. It includes a variety of components and builds on methods and concepts from many domains, including mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization and data warehousing aiming to extract va...